More than Bag-of-Words: Sentence-based Document Representation for Sentiment Analysis

Authors

  • Georgios Paltoglou
  • Mike Thelwall
Abstract

Most sentiment analysis approaches rely on machine-learning techniques, using a bag-of-words (BoW) document representation as their basis. In this paper, we examine whether a more fine-grained representation of documents as sequences of emotionally-annotated sentences can increase document classification accuracy. Experiments conducted on a sentence and document level annotated corpus show that the proposed solution, combined with BoW features, offers an increase in classification accuracy.
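
As a rough illustration of the idea in the abstract, the sketch below builds a feature dictionary that combines BoW counts with features derived from the sequence of per-sentence sentiment labels. The toy lexicon, the naive sentence splitter, the feature names, and the choice of sequence features (label fractions, first and last label, label bigrams) are all assumptions made for illustration; the abstract does not specify how the authors annotate sentences or encode the sequence.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only (not from the paper).
TOY_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return [t for t in tokens if t]


def split_sentences(document):
    """Naive splitter on '.', '!' and '?'; a real system would use a proper one."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def sentence_label(sentence):
    """Stand-in for a sentence-level annotation: the sign of the lexicon score."""
    score = sum(TOY_LEXICON.get(tok, 0) for tok in tokenize(sentence))
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"


def document_features(document):
    """Return (sentence label sequence, combined BoW + sentence-sequence features)."""
    labels = [sentence_label(s) for s in split_sentences(document)]

    # Bag-of-words part: one count feature per token.
    features = Counter(f"bow={tok}" for tok in tokenize(document))

    # Sentence-sequence part (assumed encoding): label fractions,
    # first/last sentence label, and counts of adjacent label pairs.
    total = len(labels) or 1
    for lab in ("pos", "neg", "neu"):
        features[f"sent_frac={lab}"] = labels.count(lab) / total
    if labels:
        features[f"first_sent={labels[0]}"] = 1
        features[f"last_sent={labels[-1]}"] = 1
    for a, b in zip(labels, labels[1:]):
        features[f"sent_bigram={a}>{b}"] += 1

    return labels, dict(features)


labels, feats = document_features("The plot was awful. I love the cast! It runs two hours.")
print(labels)
print(feats["sent_bigram=neg>pos"])
```

The resulting dictionary could be passed to any classifier that accepts sparse named features. Keeping the BoW and sentence-level features in one dictionary mirrors the combination the abstract reports, without claiming to reproduce its setup.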

Similar resources

Emotion Classification of Chinese Microblog Text via Fusion of BoW and eVector Feature Representations

Sentiment analysis has been a hot research topic in recent years. Emotion classification is a finer-grained form of sentiment analysis that looks beyond the polarity of sentiment. In this paper, we present our emotion analysis system for Sina Weibo texts at both the document and sentence level, which detects whether a text is sentimental and further decides which emotion classes it conve...

Evaluation of a General-Purpose Sentiment Lexicon on a Product Review Corpus

This paper introduces a new general-purpose sentiment lexicon called the WKWSCI Sentiment Lexicon and compares it with three existing lexicons. The WKWSCI Sentiment Lexicon is based on the 6of12dict lexicon, and currently covers adjectives, adverbs and verbs. The words were manually coded with a value on a 7-point sentiment strength scale. The effectiveness of the four sentiment lexicons for se...

The Role of Knowledge-based Features in Polarity Classification at Sentence Level

Though polarity classification has been extensively explored at document level, there has been little work investigating feature design at sentence level. Due to the small number of words within a sentence, polarity classification at sentence level differs substantially from document-level classification in that the resulting bag-of-words feature vectors tend to be very sparse, resulting in a lower ...

Compact Features for Sentiment Analysis

This work examines a novel method of developing features to use for machine learning of sentiment analysis and related tasks. This task is frequently approached using a “Bag of Words” representation – one feature for each word encountered in the training data – which can easily involve thousands of features. This paper describes a set of compact features developed by learning scores for words, ...

Rich Document Representation for Document Clustering

In traditional document clustering models, a document is considered as a bag of words. In this paper we present a new method for generating feature vectors, using the sentence fragments that are called logical terms and statements in the PLIR system. PLIR is a knowledge-based information system based on the theory of Plausible Reasoning. We have conducted a number of experiments using OHSUMED ...

Publication date: 2013